patient data
FHIR-AgentBench: Benchmarking LLM Agents for Realistic Interoperable EHR Question Answering
Lee, Gyubok, Bach, Elea, Yang, Eric, Pollard, Tom, Johnson, Alistair, Choi, Edward, Jia, Yugang, Lee, Jong Ha
The recent shift toward the Health Level Seven Fast Healthcare Interoperability Resources (HL7 FHIR) standard opens a new frontier for clinical AI, requiring LLM agents to navigate complex, resource-based data models instead of conventional structured health data. However, existing benchmarks have lagged behind this transition, lacking the realism needed to evaluate recent LLMs on interoperable clinical data. To bridge this gap, we introduce FHIR-AgentBench, a benchmark that grounds 2,931 real-world clinical questions in the HL7 FHIR standard. Using this benchmark, we systematically evaluate agentic frameworks, comparing different data retrieval strategies (direct FHIR API calls vs. specialized tools), interaction patterns (single-turn vs. multi-turn), and reasoning strategies (natural language vs. code generation). Our experiments highlight the practical challenges of retrieving data from intricate FHIR resources and the difficulty of reasoning over them, both of which critically affect question answering performance.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Asia > South Korea (0.04)
- Research Report > Experimental Study (0.48)
- Research Report > New Finding (0.46)
- Health & Medicine > Health Care Technology > Medical Record (0.95)
- Health & Medicine > Diagnostic Medicine (0.88)
Federated Deep Reinforcement Learning for Privacy-Preserving Robotic-Assisted Surgery
Hafeez, Sana, Mulkana, Sundas Rafat, Imran, Muhammad Ali, Sevegnani, Michele
The integration of Reinforcement Learning (RL) into robotic-assisted surgery (RAS) holds significant promise for advancing surgical precision, adaptability, and autonomous decision-making. However, the development of robust RL models in clinical settings is hindered by key challenges, including stringent patient data privacy regulations, limited access to diverse surgical datasets, and high procedural variability. To address these limitations, this paper presents a Federated Deep Reinforcement Learning (FDRL) framework that enables decentralized training of RL models across multiple healthcare institutions without exposing sensitive patient information. A central innovation of the proposed framework is its dynamic policy adaptation mechanism, which allows surgical robots to select and tailor patient-specific policies in real time, thereby ensuring personalized and optimised interventions. To uphold rigorous privacy standards while facilitating collaborative learning, the FDRL framework incorporates secure aggregation, differential privacy, and homomorphic encryption techniques. Experimental results demonstrate a 60% reduction in privacy leakage compared to conventional methods, with surgical precision maintained within a 1.5% margin of a centralized baseline. This work establishes a foundational approach for adaptive, secure, and patient-centric AI-driven surgical robotics, offering a pathway toward clinical translation and scalable deployment across diverse healthcare environments.
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Surgery (1.00)
- Health & Medicine > Health Care Technology (1.00)
A systematic review of trial-matching pipelines using large language models
Morrison, Braxton A., Sushil, Madhumita, Young, Jacob S.
Matching patients to clinical trial options is critical for identifying novel treatments, especially in oncology. However, manual matching is labor-intensive and error-prone, leading to recruitment delays. Pipelines incorporating large language models (LLMs) offer a promising solution. We conducted a systematic review of studies published between 2020 and 2025 from three academic databases and one preprint server, identifying LLM-based approaches to clinical trial matching. Of 126 unique articles, 31 met inclusion criteria. Reviewed studies focused on matching patient-to-criterion only (n=4), patient-to-trial only (n=10), trial-to-patient only (n=2), binary eligibility classification only (n=1), or combined tasks (n=14). Sixteen used synthetic data; fourteen used real patient data; one used both. Variability in datasets and evaluation metrics limited cross-study comparability. In studies with direct comparisons, the GPT-4 model consistently outperformed other models, even fine-tuned ones, in matching and eligibility extraction, albeit at higher cost. Promising strategies included zero-shot prompting with proprietary LLMs like the GPT-4o model, advanced retrieval methods, and fine-tuning smaller, open-source models for data privacy when incorporation of large models into hospital infrastructure is infeasible. Key challenges include accessing sufficiently large real-world datasets and deployment-associated challenges such as reducing cost and mitigating the risk of hallucinations, data leakage, and bias. This review synthesizes progress in applying LLMs to clinical trial matching, highlighting promising directions and key limitations. Standardized metrics, more realistic test sets, and attention to cost-efficiency and fairness will be critical for broader deployment.
- North America > United States > California > San Francisco County > San Francisco (0.29)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
T-SYNTH: A Knowledge-Based Dataset of Synthetic Breast Images
Wiedeman, Christopher, Sarmakeeva, Anastasiia, Sizikova, Elena, Filienko, Daniil, Lago, Miguel, Delfino, Jana G., Badano, Aldo
Responsible for approximately two million new cases and over six hundred thousand deaths in 2022 alone (Sung et al., 2021), breast cancer remains a prominent global health concern, and is expected to account for nearly one-third of all newly diagnosed cancers among women in the United States (DeSantis et al., 2016). According to the most recent report from the International Agency for Research on Cancer (Bray et al., 2024), it is one of the most widespread cancers diagnosed worldwide, both in the number of cases and associated deaths. Consequently, medical imaging techniques are indispensable for screening, diagnosis, and further research into the disease. Historically, the most common imaging technique for breast cancer screening has been digital mammography (DM), in which a 2D x-ray projection of a compressed breast is taken. Digital breast tomosynthesis (DBT), a pseudo-3D imaging technique, has been increasingly adopted, demonstrating improved screening performance (Asbeutah et al., 2019; Sprague et al., 2023).
- North America > United States > Virginia (0.04)
- North America > United States > Vermont (0.04)
- North America > United States > Maryland > Montgomery County > Silver Spring (0.04)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.92)
FHIR-RAG-MEDS: Integrating HL7 FHIR with Retrieval-Augmented Large Language Models for Enhanced Medical Decision Support
Kabak, Yildiray, Erturkmen, Gokce B. Laleci, Gencturk, Mert, Namli, Tuncay, Sinaci, A. Anil, Corcoles, Ruben Alcantud, Ballesteros, Cristina Gomez, Abizanda, Pedro, Dogac, Asuman
In recent years, the field of medical informatics has seen significant advancements with the introduction of medical large language models (LLMs). These models, powered by artificial intelligence, have demonstrated remarkable capabilities in understanding and generating medical text, providing valuable assistance in clinical decision-making, diagnostics, and patient care. Prominent examples include models such as Meditron [1], BioMistral [2] and OpenBioLLM [3], which have shown considerable promise in various medical applications. However, despite these advancements, the inherent limitations of medical LLMs highlight the need for more robust solutions.
- North America > United States (0.04)
- Europe > Spain > Castilla-La Mancha > Albacete Province > Albacete (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.68)
Mitigating Clinician Information Overload: Generative AI for Integrated EHR and RPM Data Analysis
Shetgaonkar, Ankit, Pradhan, Dipen, Arora, Lakshit, Girija, Sanjay Surendranath, Kapoor, Shashank, Raj, Aman
Generative Artificial Intelligence (GenAI), particularly Large Language Models (LLMs), offers powerful capabilities for interpreting the complex data landscape in healthcare. In this paper, we present a comprehensive overview of the capabilities, requirements and applications of GenAI for deriving clinical insights and improving clinical efficiency. We first provide some background on the forms and sources of patient data, namely real-time Remote Patient Monitoring (RPM) streams and traditional Electronic Health Records (EHRs). The sheer volume and heterogeneity of this combined data present significant challenges to clinicians and contribute to information overload. In addition, we explore the potential of LLM-powered applications for improving clinical efficiency. These applications can enhance navigation of longitudinal patient data and provide actionable clinical decision support through natural language dialogue. We discuss the opportunities this presents for streamlining clinician workflows and personalizing care, alongside critical challenges such as data integration complexity, ensuring data quality and RPM data reliability, maintaining patient privacy, validating AI outputs for clinical safety, mitigating bias, and ensuring clinical acceptance. We believe this work represents the first summarization of GenAI techniques for managing clinician data overload due to combined RPM/EHR data complexities.
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Asia > China (0.04)
- Overview (1.00)
- Research Report > Experimental Study (0.46)
An AI Model for the Brain Is Coming to the ICU
The Cleveland Clinic is partnering with San Francisco-based startup Piramidal to develop a large-scale AI model that will be used to monitor patients' brain health in intensive care units. Instead of being trained on text, the system is based on electroencephalogram (EEG) data, which is collected via electrodes placed on the scalp and then read out by a computer in a series of wavy lines. EEG records the brain's electrical activity, and changes in this activity can indicate a problem. In an ICU setting, doctors scan EEG data looking for evidence of seizures, altered consciousness, or a decline in brain function. Currently, doctors rely on continuous EEG monitoring to detect abnormal brain activity in an ICU patient, but they can't monitor every individual patient in real time.
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Providers & Services (1.00)
Cross-patient Seizure Onset Zone Classification by Patient-Dependent Weight
Zhao, Xuyang, Sugano, Hidenori, Tanaka, Toshihisa
Identifying the seizure onset zone (SOZ) in patients with focal epilepsy is essential for surgical treatment and remains challenging due to its dependence on visual judgment by clinical experts. The development of machine learning can assist in diagnosis and has made promising progress. However, unlike data in other fields, medical data is usually collected from individual patients, and each patient has different illnesses, physical conditions, and medical histories, which leads to differences in the distribution of each patient's data. This makes it difficult for a machine learning model to achieve consistently reliable performance in every new patient dataset, which we refer to as the "cross-patient problem." In this paper, we propose a method to fine-tune a pretrained model using patient-specific weights for every new test patient to improve diagnostic performance. First, the supervised learning method is used to train a machine learning model. Next, using the intermediate features of the trained model obtained through the test patient data, the similarity between the test patient data and each training patient's data is defined to determine the weight of each training patient to be used in the following fine-tuning. Finally, we fine-tune all parameters in the pretrained model with training data and patient weights. In the experiment, the leave-one-patient-out method is used to evaluate the proposed method, and the results show improved classification accuracy for every test patient, with an average improvement of more than 10%.
- Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.95)
- Health & Medicine > Therapeutic Area > Genetic Disease (0.95)
Leveraging Generative AI to Enhance Synthea Module Development
Kramer, Mark A., Mathur, Aanchal, Adams, Caroline E., Walonoski, Jason A.
This paper explores the use of large language models (LLMs) to assist in the development of new disease modules for Synthea, an open-source synthetic health data generator. Incorporating LLMs into the module development process has the potential to reduce development time, reduce required expertise, expand model diversity, and improve the overall quality of synthetic patient data. We demonstrate four ways that LLMs can support Synthea module creation: generating a disease profile, generating a disease module from a disease profile, evaluating an existing Synthea module, and refining an existing module. We introduce the concept of progressive refinement, which involves iteratively evaluating the LLM-generated module by checking its syntactic correctness and clinical accuracy, and then using that information to modify the module. While the use of LLMs in this context shows promise, we also acknowledge the challenges and limitations, such as the need for human oversight, the importance of rigorous testing and validation, and the potential for inaccuracies in LLM-generated content. The paper concludes with recommendations for future research and development to fully realize the potential of LLM-aided synthetic data creation.
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Internal Medicine (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology (1.00)
Decentralized AI-driven IoT Architecture for Privacy-Preserving and Latency-Optimized Healthcare in Pandemic and Critical Care Scenarios
Sammangi, Harsha, Jagatha, Aditya, Bojja, Giridhar Reddy, Liu, Jun
AI Innovations in the IoT for Real-Time Patient Monitoring. The traditional centralized healthcare architecture poses numerous issues, including data privacy, delay, and security. Here, we present an AI-enabled decentralized IoT architecture that can address such challenges in pandemic and critical care settings. The architecture is designed to enhance the effectiveness of currently available federated learning, blockchain, and edge computing approaches, maximizing data privacy, minimizing latency, and improving other general system metrics. Experimental results demonstrate transaction latency, energy consumption, and data throughput orders of magnitude lower than competitive cloud solutions.